StyleGAN has shown strong potential for disentangled semantic control, thanks to its special design of multi-layer intermediate latent variables. However, existing semantic discovery methods on StyleGAN rely on manual selection of modified latent layers to obtain satisfactory manipulation results, which is tedious and demanding. In this paper, we propose a model that automates this process and achieves state-of-the-art semantic discovery performance. The model consists of an attention-equipped navigator module and losses contrasting deep-feature changes. We propose two model variants, with one contrasting samples in a binary manner, and another one contrasting samples with learned prototype variation patterns. The proposed losses are defined with pretrained deep features, based on our assumption that the features can implicitly reveal the desired semantic structure including consistency and orthogonality. Additionally, we design two metrics to quantitatively evaluate the performance of semantic discovery methods on FFHQ dataset, and also show that disentangled representations can be derived via a simple training process. Experimentally, our models can obtain state-of-the-art semantic discovery results without relying on latent layer-wise manual selection, and these discovered semantics can be used to manipulate real-world images.
translated by 谷歌翻译
Covid-19-Pandemic继续在世界上迅速传播,并在全球人类健康和经济中造成巨大危机。它的早期检测和诊断对于控制进一步的扩散至关重要。已经提出了许多基于学习的深度方法,以帮助临床医生根据计算机断层扫描成像进行自动COVID-19诊断。但是,仍然存在挑战,包括现有数据集中的数据多样性,以及由于深度学习模型的准确性和敏感性不足而导致的检测不满意。为了增强数据多样性,我们设计了增量级别的增强技术,并将其应用于最大的开放式基准测试数据集Covidx CT-2A。同时,在本研究中提出了从对比度学习中得出的相似性正则化(SR),以使CNN能够学习更多参数有效的表示,从而提高了CNN的准确性和敏感性。七个常用CNN的结果表明,通过应用设计的增强和SR技术,可以稳定地提高CNN性能。特别是,具有SR的Densenet121在三个试验中的三类分类中达到99.44%的平均测试准确性,包括正常,非covid-19-19-19肺炎和Covid-19-19。 COVID-19肺炎类别的精确度,敏感性和特异性分别为98.40%,99.59%和99.50%。这些统计数据表明,我们的方法已经超过了COVIDX CT-2A数据集上现有的最新方法。
translated by 谷歌翻译
本文介绍了一项有关离线增强学习中依赖间隙依赖样品复杂性的系统研究。先前的工作显示了何时最佳策略和行为策略之间的密度比上限(最佳策略覆盖范围假设),则代理可以实现$ o \ left(\ frac {1} {\ epsilon^2} \ right)$ rate,这也是最小值的最佳。我们在最佳策略覆盖范围假设下显示,当在最佳$ q $ unction中存在积极的子临时差距时,可以将费率提高到$ o \ left(\ frac {1} {\ epsilon} \ right)$。。此外,我们显示了行为策略的访问概率何时在最佳策略的访问概率为正(统一的最佳策略覆盖范围假设)的状态下,均匀下降,识别最佳政策的样本复杂性独立于$ \ frac {1} {\ epsilon} $。最后,我们呈现几乎匹配的下限,以补充我们的间隙依赖性上限。
translated by 谷歌翻译
冠状病毒2019年对世界产生了重大影响。一种防止人们感染的一种有效策略是在公共场所佩戴面具。某些公共服务提供商只有在正确佩戴面具时才需要客户使用他们的服务。然而,只有一些关于自动面罩检测的研究。在本文中,我们提出了RetinAfaceMask,第一高性能单级面罩探测器。首先,解决现有研究没有区分正确和错误的掩码佩戴状态的问题,我们建立了一个包含这些注释的新数据集。其次,我们提出了一种上下文注意模块,专注于学习与面罩佩戴状态相关的区别特征。第三,我们从面部检测任务转移了知识,灵感来自人类通过学习从类似的任务学习如何改善他们的能力。消融研究表明提出的模型的优点。公共和新数据集的实验结果表明了我们模型的最先进的表现。
translated by 谷歌翻译
Benefiting from the intrinsic supervision information exploitation capability, contrastive learning has achieved promising performance in the field of deep graph clustering recently. However, we observe that two drawbacks of the positive and negative sample construction mechanisms limit the performance of existing algorithms from further improvement. 1) The quality of positive samples heavily depends on the carefully designed data augmentations, while inappropriate data augmentations would easily lead to the semantic drift and indiscriminative positive samples. 2) The constructed negative samples are not reliable for ignoring important clustering information. To solve these problems, we propose a Cluster-guided Contrastive deep Graph Clustering network (CCGC) by mining the intrinsic supervision information in the high-confidence clustering results. Specifically, instead of conducting complex node or edge perturbation, we construct two views of the graph by designing special Siamese encoders whose weights are not shared between the sibling sub-networks. Then, guided by the high-confidence clustering information, we carefully select and construct the positive samples from the same high-confidence cluster in two views. Moreover, to construct semantic meaningful negative sample pairs, we regard the centers of different high-confidence clusters as negative samples, thus improving the discriminative capability and reliability of the constructed sample pairs. Lastly, we design an objective function to pull close the samples from the same cluster while pushing away those from other clusters by maximizing and minimizing the cross-view cosine similarity between positive and negative samples. Extensive experimental results on six datasets demonstrate the effectiveness of CCGC compared with the existing state-of-the-art algorithms.
translated by 谷歌翻译
To generate high quality rendering images for real time applications, it is often to trace only a few samples-per-pixel (spp) at a lower resolution and then supersample to the high resolution. Based on the observation that the rendered pixels at a low resolution are typically highly aliased, we present a novel method for neural supersampling based on ray tracing 1/4-spp samples at the high resolution. Our key insight is that the ray-traced samples at the target resolution are accurate and reliable, which makes the supersampling an interpolation problem. We present a mask-reinforced neural network to reconstruct and interpolate high-quality image sequences. First, a novel temporal accumulation network is introduced to compute the correlation between current and previous features to significantly improve their temporal stability. Then a reconstruct network based on a multi-scale U-Net with skip connections is adopted for reconstruction and generation of the desired high-resolution image. Experimental results and comparisons have shown that our proposed method can generate higher quality results of supersampling, without increasing the total number of ray-tracing samples, over current state-of-the-art methods.
translated by 谷歌翻译
Temporal sentence grounding (TSG) aims to identify the temporal boundary of a specific segment from an untrimmed video by a sentence query. All existing works first utilize a sparse sampling strategy to extract a fixed number of video frames and then conduct multi-modal interactions with query sentence for reasoning. However, we argue that these methods have overlooked two indispensable issues: 1) Boundary-bias: The annotated target segment generally refers to two specific frames as corresponding start and end timestamps. The video downsampling process may lose these two frames and take the adjacent irrelevant frames as new boundaries. 2) Reasoning-bias: Such incorrect new boundary frames also lead to the reasoning bias during frame-query interaction, reducing the generalization ability of model. To alleviate above limitations, in this paper, we propose a novel Siamese Sampling and Reasoning Network (SSRN) for TSG, which introduces a siamese sampling mechanism to generate additional contextual frames to enrich and refine the new boundaries. Specifically, a reasoning strategy is developed to learn the inter-relationship among these frames and generate soft labels on boundaries for more accurate frame-query reasoning. Such mechanism is also able to supplement the absent consecutive visual semantics to the sampled sparse frames for fine-grained activity understanding. Extensive experiments demonstrate the effectiveness of SSRN on three challenging datasets.
translated by 谷歌翻译
Representing and synthesizing novel views in real-world dynamic scenes from casual monocular videos is a long-standing problem. Existing solutions typically approach dynamic scenes by applying geometry techniques or utilizing temporal information between several adjacent frames without considering the underlying background distribution in the entire scene or the transmittance over the ray dimension, limiting their performance on static and occlusion areas. Our approach $\textbf{D}$istribution-$\textbf{D}$riven neural radiance fields offers high-quality view synthesis and a 3D solution to $\textbf{D}$etach the background from the entire $\textbf{D}$ynamic scene, which is called $\text{D}^4$NeRF. Specifically, it employs a neural representation to capture the scene distribution in the static background and a 6D-input NeRF to represent dynamic objects, respectively. Each ray sample is given an additional occlusion weight to indicate the transmittance lying in the static and dynamic components. We evaluate $\text{D}^4$NeRF on public dynamic scenes and our urban driving scenes acquired from an autonomous-driving dataset. Extensive experiments demonstrate that our approach outperforms previous methods in rendering texture details and motion areas while also producing a clean static background. Our code will be released at https://github.com/Luciferbobo/D4NeRF.
translated by 谷歌翻译
Deploying reliable deep learning techniques in interdisciplinary applications needs learned models to output accurate and ({even more importantly}) explainable predictions. Existing approaches typically explicate network outputs in a post-hoc fashion, under an implicit assumption that faithful explanations come from accurate predictions/classifications. We have an opposite claim that explanations boost (or even determine) classification. That is, end-to-end learning of explanation factors to augment discriminative representation extraction could be a more intuitive strategy to inversely assure fine-grained explainability, e.g., in those neuroimaging and neuroscience studies with high-dimensional data containing noisy, redundant, and task-irrelevant information. In this paper, we propose such an explainable geometric deep network dubbed as NeuroExplainer, with applications to uncover altered infant cortical development patterns associated with preterm birth. Given fundamental cortical attributes as network input, our NeuroExplainer adopts a hierarchical attention-decoding framework to learn fine-grained attentions and respective discriminative representations to accurately recognize preterm infants from term-born infants at term-equivalent age. NeuroExplainer learns the hierarchical attention-decoding modules under subject-level weak supervision coupled with targeted regularizers deduced from domain knowledge regarding brain development. These prior-guided constraints implicitly maximizes the explainability metrics (i.e., fidelity, sparsity, and stability) in network training, driving the learned network to output detailed explanations and accurate classifications. Experimental results on the public dHCP benchmark suggest that NeuroExplainer led to quantitatively reliable explanation results that are qualitatively consistent with representative neuroimaging studies.
translated by 谷歌翻译
Domain adaptation methods reduce domain shift typically by learning domain-invariant features. Most existing methods are built on distribution matching, e.g., adversarial domain adaptation, which tends to corrupt feature discriminability. In this paper, we propose Discriminative Radial Domain Adaptation (DRDR) which bridges source and target domains via a shared radial structure. It's motivated by the observation that as the model is trained to be progressively discriminative, features of different categories expand outwards in different directions, forming a radial structure. We show that transferring such an inherently discriminative structure would enable to enhance feature transferability and discriminability simultaneously. Specifically, we represent each domain with a global anchor and each category a local anchor to form a radial structure and reduce domain shift via structure matching. It consists of two parts, namely isometric transformation to align the structure globally and local refinement to match each category. To enhance the discriminability of the structure, we further encourage samples to cluster close to the corresponding local anchors based on optimal-transport assignment. Extensively experimenting on multiple benchmarks, our method is shown to consistently outperforms state-of-the-art approaches on varied tasks, including the typical unsupervised domain adaptation, multi-source domain adaptation, domain-agnostic learning, and domain generalization.
translated by 谷歌翻译